An A-Z guide to nctoolkit methods

This guide provides examples of how to use almost every method available in nctoolkit. All examples will be tested using two datasets. First, a NetCDF dataset of sea surface temperature from 1850 to the present day. Second, a NetCDF file showing... You can get these data files here.

For all examples we will import nctoolkit as follows:

In [1]:
import nctoolkit as nc
!wget ftp://ftp.cdc.noaa.gov/Datasets/COBE2/sst.mon.mean.nc

add

This method can add to a dataset. You can add a constant, another dataset or a NetCDF file. In the case of datasets or NetCDF files the grids etc. must be of the same structure as the original dataset.

We can illustrate this by converting the SST dataset from degrees Celsius to Kelvin.

In [2]:
data = nc.open_data("sst.mon.mean.nc")
data.add(273.15)
data.select_timestep(0)
data.plot()
Out[2]:
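
The requirement that a dataset or file operand shares the original grid can be pictured with a small plain-Python sketch. This is not nctoolkit code: the function add_to_grid and the nested-list grid are hypothetical stand-ins, used only to contrast adding a constant with adding a second grid.

```python
def add_to_grid(grid, other):
    """Add a constant or a same-shaped grid to `grid` (a list of rows)."""
    if isinstance(other, (int, float)):
        # A constant is added to every grid cell.
        return [[value + other for value in row] for row in grid]
    # Another grid must have exactly the same structure.
    if len(other) != len(grid) or any(len(a) != len(b) for a, b in zip(grid, other)):
        raise ValueError("grids must have the same structure")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(grid, other)]


# Toy 2 x 2 grid of temperatures in degrees Celsius (made-up values).
sst_celsius = [[0.0, 10.0], [20.0, 30.0]]
sst_kelvin = add_to_grid(sst_celsius, 273.15)    # constant operand
doubled = add_to_grid(sst_celsius, sst_celsius)  # grid operand
```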

annual_anomaly

This method will calculate the annual anomaly compared with a baseline. We can illustrate this by first calculating the global mean sea surface temperature and then calculating the anomaly relative to a baseline period of 1900-1919, to estimate how much global sea surface temperatures have increased since the start of the 20th century. We will use a window of 20 years, so that the anomaly is derived from a rolling mean.

In [3]:
data = nc.open_data("sst.mon.mean.nc")
data.spatial_mean()
data.annual_anomaly(baseline=[1900, 1919], window=20)
data.plot()
Out[3]:
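
The arithmetic behind a rolling anomaly can be sketched in plain Python. This is an illustration under stated assumptions, not nctoolkit's implementation: the function name, the dictionary input and the choice to label each window by its final year are all hypothetical.

```python
def annual_anomaly(annual_means, baseline, window=1):
    """Rolling-mean anomaly relative to the mean of a baseline period.

    annual_means: dict mapping year -> annual mean value.
    baseline: [first_year, last_year], inclusive.
    window: number of years in the rolling mean.
    """
    years = sorted(annual_means)
    first, last = baseline
    base_values = [annual_means[y] for y in years if first <= y <= last]
    base = sum(base_values) / len(base_values)

    anomalies = {}
    for i in range(window - 1, len(years)):
        # Trailing window ending at years[i] (an assumption of this sketch).
        chunk = [annual_means[y] for y in years[i - window + 1 : i + 1]]
        anomalies[years[i]] = sum(chunk) / window - base
    return anomalies


# Made-up annual means, used only to show the calculation.
series = {1900: 10.0, 1901: 10.0, 1902: 11.0, 1903: 12.0}
result = annual_anomaly(series, baseline=[1900, 1901], window=2)
print(result)  # {1901: 0.0, 1902: 0.5, 1903: 1.5}
```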

annual_max

This method will calculate the maximum value in each available year and for each grid cell of a dataset.

In [4]:
data = nc.open_data("sst.mon.mean.nc")
data.annual_max()

annual_mean

This method will calculate the mean value in each available year and for each grid cell of a dataset.

In [5]:
data = nc.open_data("sst.mon.mean.nc")
data.annual_mean()

annual_min

This method will calculate the minimum value in each available year and for each grid cell of a dataset.

In [6]:
data = nc.open_data("sst.mon.mean.nc")
data.annual_min()

annual_range

This method will calculate the range of values in each available year and for each grid cell of a dataset.

In [7]:
data = nc.open_data("sst.mon.mean.nc")
data.annual_range()
data.select_years(2000)
data.plot()
Out[7]:
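
The four annual statistics above differ only in the function applied to each year's values. A minimal plain-Python sketch for a single grid cell, assuming a hypothetical dictionary of (year, month) -> value records rather than a real dataset:

```python
from collections import defaultdict


def annual_stats(records):
    """Group monthly values by year and summarise each year."""
    by_year = defaultdict(list)
    for (year, _month), value in records.items():
        by_year[year].append(value)
    return {
        year: {
            "max": max(values),
            "mean": sum(values) / len(values),
            "min": min(values),
            "range": max(values) - min(values),
        }
        for year, values in sorted(by_year.items())
    }


# Made-up monthly values for one grid cell.
monthly = {(2000, 1): 1.0, (2000, 2): 3.0, (2001, 1): 2.0, (2001, 2): 6.0}
stats = annual_stats(monthly)
print(stats[2000])  # {'max': 3.0, 'mean': 2.0, 'min': 1.0, 'range': 2.0}
```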

bottom

TBC

bottom_mask

TBC

cdo_command

This method lets you run a CDO command. CDO commands are generally of the form "cdo {command} infile outfile", so cdo_command only requires the command portion. If we wanted to run the following CDO command

cdo -timmean -selmon,4 infile outfile

we would do the following:

In [8]:
data = nc.open_data("sst.mon.mean.nc")
data.cdo_command("-timmean -selmon,4")
data.plot()
Out[8]:
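
To make the relationship between the command portion and the full CDO call explicit, here is a small sketch that assembles the argument list. The helper build_cdo_call and the file names are hypothetical, and nothing is executed; how nctoolkit builds and runs the call internally is not shown here.

```python
import shlex


def build_cdo_call(command, infile, outfile):
    """Return the argument list for `cdo {command} infile outfile`."""
    return ["cdo", *shlex.split(command), infile, outfile]


args = build_cdo_call("-timmean -selmon,4", "infile.nc", "outfile.nc")
print(" ".join(args))  # cdo -timmean -selmon,4 infile.nc outfile.nc
```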

cell_areas

This method either adds the areas of each grid cell to the dataset or converts the dataset to a new dataset showing only the grid cell areas. If we wanted to see the grid cell areas of the SST dataset, we would do the following:

In [9]:
data = nc.open_data("sst.mon.mean.nc")
data.cell_areas(join=False)
data.plot()
Out[9]:
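
For a regular longitude/latitude grid, cell areas shrink towards the poles. The sketch below uses the standard formula for the area of a latitude/longitude box on a sphere. It is only an approximation for intuition: the spherical Earth, the 6371 km radius and the function name are assumptions, and the values nctoolkit reports may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def cell_area(lat_south, lat_north, lon_width_deg, radius=EARTH_RADIUS_M):
    """Area in m^2 of a lon/lat box on a sphere."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return radius ** 2 * math.radians(lon_width_deg) * band


equator_cell = cell_area(0.0, 1.0, 1.0)
polar_cell = cell_area(89.0, 90.0, 1.0)

# Sanity check: 1-degree latitude bands spanning all longitudes
# should add up to the surface area of the sphere.
globe = sum(cell_area(lat, lat + 1, 360.0) for lat in range(-90, 90))
sphere = 4 * math.pi * EARTH_RADIUS_M ** 2
```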

clip

This method will clip a dataset to a specified longitude and latitude box. For example, if we wanted to clip a dataset to the North Atlantic, we could do this:

In [10]:
data = nc.open_data("sst.mon.mean.nc")
data.clip(lon = [-80, 20], lat = [40, 70])
data.select_timestep(0)
data.plot()
Out[10]:
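
Clipping keeps only the grid cells whose coordinates fall inside the box, so the grid itself becomes smaller. A plain-Python sketch of that idea, using a hypothetical clip_grid function and a toy grid rather than nctoolkit objects:

```python
def clip_grid(lons, lats, grid, lon, lat):
    """Keep only the cells whose centres lie inside the lon/lat box.

    grid is a list of rows, one row per latitude.
    """
    lon_idx = [j for j, x in enumerate(lons) if lon[0] <= x <= lon[1]]
    lat_idx = [i for i, y in enumerate(lats) if lat[0] <= y <= lat[1]]
    clipped = [[grid[i][j] for j in lon_idx] for i in lat_idx]
    return [lons[j] for j in lon_idx], [lats[i] for i in lat_idx], clipped


# Toy 3 x 4 grid with made-up values.
lons = [-100.0, -50.0, 0.0, 50.0]
lats = [30.0, 50.0, 80.0]
grid = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0], [20.0, 21.0, 22.0, 23.0]]

new_lons, new_lats, new_grid = clip_grid(lons, lats, grid, lon=[-80, 20], lat=[40, 70])
print(new_grid)  # [[11.0, 12.0]]
```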

compare_all

This method lets us compare all variables in a dataset with a constant. If we wanted to identify all regions with sea surface temperature exceeding 20 C in June 2000, we could do the following:

In [11]:
data = nc.open_data("sst.mon.mean.nc")
data.select_years(2000)
data.select_months(6)
data.compare_all(">20")
data.plot()
Out[11]:
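
A comparison of this kind can be pictured as turning every grid cell into a flag. The sketch below assumes the result is 1 where the comparison holds and 0 where it does not, which is how CDO comparison operators behave. The parsing logic and the function compare_grid are hypothetical and are not taken from nctoolkit.

```python
import operator

# Two-character operators are checked before their one-character prefixes.
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def compare_grid(grid, expression):
    """Return a grid of 1/0 flags for an expression such as '>20'."""
    expression = expression.strip()
    for symbol, func in _OPERATORS:
        if expression.startswith(symbol):
            threshold = float(expression[len(symbol):])
            return [[1 if func(value, threshold) else 0 for value in row] for row in grid]
    raise ValueError(f"unsupported comparison: {expression!r}")


# Made-up temperatures in degrees Celsius.
flags = compare_grid([[19.0, 21.0], [20.0, 25.5]], ">20")
print(flags)  # [[0, 1], [0, 1]]
```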

cor_space

This method calculates the correlation coefficients between two variables in space for each time step. We can illustrate, in a rather boring way, by working out the spatial correlation coefficient between sea surface temperature in C and K.

In [12]:
data = nc.open_data("sst.mon.mean.nc")
data.mutate({"sst_k":"sst+273.15"})
data.cor_space("sst", "sst_k")
data.plot()
Out[12]:

cor_time

This method calculates the correlation coefficients between two variables in time for each grid cell. We can illustrate, in a rather boring way, by working out the temporal correlation coefficient between sea surface temperature in C and K.

In [13]:
data = nc.open_data("sst.mon.mean.nc")
data.mutate({"sst_k":"sst+273.15"})
data.cor_time("sst", "sst_k")
data.plot()
Out[13]:
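
The result of both correlation examples is predictable, because adding a constant does not change a Pearson correlation: the coefficient between a series and the same series shifted by 273.15 is 1. A small plain-Python check with made-up values (the pearson helper is written here for illustration and is not an nctoolkit function):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


sst_c = [12.1, 14.3, 17.8, 16.0, 13.2]  # degrees Celsius
sst_k = [value + 273.15 for value in sst_c]  # the same values in Kelvin
r = pearson(sst_c, sst_k)
```

For cor_space the sequences would be the values across grid cells at one time step; for cor_time they would be the values across time steps in one grid cell.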

cum_sum

TBC

daily_max_climatology

TBC

daily_mean_climatology

TBC

daily_min_climatology

TBC

daily_range_climatology

TBC

divide

This method will divide a dataset by a constant, or by the values in another dataset or NetCDF file. We can illustrate this by dividing a dataset by itself and confirming that the resulting values are 1.

In [20]:
data = nc.open_data("sst.mon.mean.nc")
data.select_timestep(0)
data.divide(data)
data.plot()
Out[20]:

ensemble_max

TBC

ensemble_min

TBC

invert_levels

This method will invert the vertical levels of a dataset.

mask_box

This method will set everything outside a specified longitude/latitude box to NA. The code below illustrates how to mask the North Atlantic in the SST dataset.

In [14]:
data = nc.open_data("sst.mon.mean.nc")
data.select_timestep(0)
data.mask_box(lon = [-80, 20], lat = [40, 70])
data.plot()
Out[14]:
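
Unlike clipping, masking leaves the grid dimensions unchanged and only blanks out the cells outside the box. A plain-Python sketch of that behaviour, using NaN to stand in for NA; the function mask_grid and the toy grid are hypothetical:

```python
import math


def mask_grid(lons, lats, grid, lon, lat):
    """Set every cell outside the lon/lat box to NaN, keeping the grid shape."""
    return [
        [
            value if lon[0] <= x <= lon[1] and lat[0] <= y <= lat[1] else math.nan
            for x, value in zip(lons, row)
        ]
        for y, row in zip(lats, grid)
    ]


# Toy 3 x 4 grid with made-up values.
lons = [-100.0, -50.0, 0.0, 50.0]
lats = [30.0, 50.0, 80.0]
grid = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0], [20.0, 21.0, 22.0, 23.0]]

masked = mask_grid(lons, lats, grid, lon=[-80, 20], lat=[40, 70])
```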

max

This method will calculate the maximum value of each variable in each grid cell across all time steps. If we wanted to calculate the maximum observed monthly sea surface temperature in the SST dataset we would do the following:

In [15]:
data = nc.open_data("sst.mon.mean.nc")
data.max()
data.plot()
Out[15]:

mean

This method will calculate the mean value of each variable in each grid cell across all time steps. If we wanted to calculate the mean observed monthly sea surface temperature in the SST dataset we would do the following:

In [17]:
data = nc.open_data("sst.mon.mean.nc")
data.mean()
data.plot()
Out[17]:
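
Both max and mean collapse the time dimension and leave one value per grid cell. The following plain-Python sketch shows that reduction on a toy dataset stored as a list of 2D grids; the function reduce_over_time is a hypothetical name, not part of nctoolkit.

```python
def reduce_over_time(steps, func):
    """Apply func to the time series of every grid cell.

    steps is a list of grids (one per time step), each a list of rows.
    """
    n_rows = len(steps[0])
    n_cols = len(steps[0][0])
    return [
        [func([step[i][j] for step in steps]) for j in range(n_cols)]
        for i in range(n_rows)
    ]


def mean(values):
    return sum(values) / len(values)


# Two time steps of a made-up 1 x 2 grid.
steps = [[[1.0, 2.0]], [[3.0, 0.0]]]
cell_max = reduce_over_time(steps, max)
cell_mean = reduce_over_time(steps, mean)
print(cell_max, cell_mean)  # [[3.0, 2.0]] [[2.0, 1.0]]
```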

merge

TBC

merge_time

TBC

monthly_anomaly

TBC

monthly_max

TBC

monthly_max_climatology

This method will calculate, for each month, the maximum value of each variable over all time steps.

In [21]:
data = nc.open_data("sst.mon.mean.nc")
data.select_years(range(1990, 1999))
data.monthly_max_climatology()
data.select_months(1)
data.plot()
Out[21]:

monthly_mean

This method will calculate the mean value of each variable in each month of a dataset. Note that this is calculated for each year. See monthly_mean_climatology if you want to calculate a climatological monthly mean.
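
The difference between a monthly mean and a climatological monthly mean comes down to the grouping key: (year, month) for the former, month alone for the latter. A plain-Python sketch with made-up daily values for one grid cell (the helper group_mean is hypothetical):

```python
from collections import defaultdict


def group_mean(records, key):
    """Average values after grouping their dates with `key`."""
    groups = defaultdict(list)
    for date, value in records:
        groups[key(date)].append(value)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# (year, month, day) -> value, covering two Januaries.
records = [
    ((2000, 1, 1), 1.0),
    ((2000, 1, 2), 3.0),
    ((2001, 1, 1), 5.0),
    ((2001, 1, 2), 7.0),
]

# One value per month of each year.
monthly = group_mean(records, key=lambda d: (d[0], d[1]))
# One value per calendar month, pooled across all years.
climatology = group_mean(records, key=lambda d: d[1])
print(monthly)      # {(2000, 1): 2.0, (2001, 1): 6.0}
print(climatology)  # {1: 4.0}
```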

monthly_mean_climatology

TBC

monthly_min

This method will calculate the minimum value of each variable in each month of a dataset. Note that this is calculated for each year. See monthly_min_climatology if you want to calculate a climatological monthly minimum.

monthly_min_climatology

TBC

monthly_range

TBC

monthly_range_climatology

TBC

multiply

This method will multiply a dataset by a constant, another dataset or a NetCDF file. If multiplied by a dataset or NetCDF file, the dataset must have the same grid and can only have one variable.

mutate

This method can be used to generate new variables using arithmetic expressions. New variables are added to the dataset. The method requires a dictionary in which each key is the name of a new variable and each value is the expression used to generate it.

If we wanted to add a new variable in the SST dataset showing SST in Kelvin, we could do the following:

In [23]:
data = nc.open_data("sst.mon.mean.nc")
data.mutate({"sst_k":"sst+273.15"})
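
Expressions can also include comparisons. In the following example the comparison sst>20 evaluates to 1 where it is true and 0 where it is false, so the new variable equals sst where it exceeds 20 C and 0 elsewhere:
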
In [25]:
data = nc.open_data("sst.mon.mean.nc")
data.mutate({"sst_k":"sst*(sst>20)"})
data.plot()
Out[25]:
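
The dictionary interface can be mimicked in plain Python to show that existing variables are kept and one new variable is added per entry. In this sketch Python callables stand in for the expression strings, and the function add_variables is hypothetical; nctoolkit's expression syntax is not being reimplemented.

```python
def add_variables(variables, expressions):
    """Return a copy of `variables` with one new entry per expression.

    variables: dict of name -> list of cell values.
    expressions: dict of new name -> function taking one cell's values.
    """
    n_cells = len(next(iter(variables.values())))
    result = dict(variables)
    for name, func in expressions.items():
        result[name] = [
            func({key: values[i] for key, values in variables.items()})
            for i in range(n_cells)
        ]
    return result


data = {"sst": [15.0, 25.0]}  # made-up values in degrees Celsius
out = add_variables(
    data,
    {
        "sst_k": lambda cell: cell["sst"] + 273.15,
        # A comparison acts as a 1/0 flag, so colder cells become 0.
        "sst_warm": lambda cell: cell["sst"] * (cell["sst"] > 20),
    },
)
```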

nco_command

TBC

percentile

This method will calculate a given percentile for each variable and grid cell. This will calculate the percentile using all available timesteps.

We can calculate the 75th percentile of sea surface temperature as follows:

In [26]:
data = nc.open_data("sst.mon.mean.nc")
data.percentile(75)
data.plot()
Out[26]:
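
Percentile definitions differ between tools, so the sketch below should be read as one possible convention rather than the one used by nctoolkit. It assumes linear interpolation between the two nearest ranks and operates on the time series of a single grid cell; the function name is hypothetical.

```python
def percentile_of(values, p):
    """p-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Made-up time series for one grid cell.
cell_series = [3.0, 1.0, 5.0, 2.0, 4.0]
p75 = percentile_of(cell_series, 75)
print(p75)  # 4.0
```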

phenology

TBC

plot

This method will plot the contents of a dataset. It will show either a map or a time series, depending on the data type. While it should work on at least 90% of NetCDF data, some data types remain incompatible; support for them will be added to nctoolkit over time.

range

This method calculates the range for all variables in each grid cell across all time steps.

We can calculate the range of sea surface temperatures in the SST dataset as follows:

In [27]:
data = nc.open_data("sst.mon.mean.nc")
data.range()
data.plot()
Out[27]:

reduce_dims

TBC

reduce_grid

TBC

regrid

This method will remap a dataset to a new grid. This grid must be either a pandas data frame, an xarray object, a NetCDF file or a single-file nctoolkit dataset. We can illustrate this method by regridding the SST dataset to the North Atlantic.

remove_variables

TBC

rename

TBC

rolling_max

TBC

rolling_mean

TBC

rolling_min

TBC

rolling_range

TBC

rolling_sum

TBC

run

TBC

seasonal_max

TBC

seasonal_max_climatology

TBC

seasonal_mean

TBC

seasonal_mean_climatology

TBC

seasonal_min

TBC

seasonal_min_climatology

TBC

seasonal_range

TBC

seasonal_range_climatology

TBC

select_months

TBC

select_season

TBC

select_timestep

TBC

select_variables

TBC

select_years

TBC

set_date

TBC

set_longnames

TBC

set_missing

TBC

set_units

TBC

spatial_max

TBC

spatial_mean

TBC

spatial_min

TBC

spatial_percentile

TBC

spatial_range

TBC

spatial_sum

TBC

split

TBC

subtract

TBC

sum

TBC

sum_all

TBC

surface

TBC

time_interp

TBC

to_dataframe

TBC

to_lonlat

TBC

to_xarray

TBC

transmute

TBC

var

TBC

vertical_cum

TBC

vertical_interp

TBC

vertical_max

TBC

vertical_mean

TBC

vertical_min

TBC

vertical_range

TBC

vertical_sum

TBC

view

TBC

write_nc

TBC

zip

TBC